Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting
Multi-person pose forecasting remains a challenging problem, especially in
modeling fine-grained human body interaction in complex crowd scenarios.
Existing methods typically represent the whole pose sequence as a temporal
series, yet overlook the interactive influences among people at the level of
skeletal body parts. In this paper, we propose a novel Trajectory-Aware Body Interaction
Transformer (TBIFormer) for multi-person pose forecasting that effectively
models body part interactions. Specifically, we construct a Temporal Body
Partition Module that transforms all the pose sequences into a Multi-Person
Body-Part sequence to retain spatial and temporal information based on body
semantics. Then, we devise a Social Body Interaction Self-Attention (SBI-MSA)
module that uses the transformed sequence to learn body part dynamics for
inter- and intra-individual interactions. Furthermore, unlike prior
Euclidean distance-based spatial encodings, we present a novel and efficient
Trajectory-Aware Relative Position Encoding for SBI-MSA that offers discriminative
spatial information and additional interactive clues. We empirically evaluate
our framework on CMU-Mocap, MuPoTS-3D, and synthesized datasets (6 to 10
persons) over both short- and long-term horizons, and demonstrate that our
method greatly outperforms state-of-the-art methods. Code will be made
publicly available upon acceptance.
Comment: Accepted by CVPR 2023, 8 pages, 6 figures. arXiv admin note: text
overlap with arXiv:2208.0922
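A minimal sketch of the body-part partition idea, assuming each person's pose sequence is stored as frames of (x, y, z) joints and using a hypothetical `parts` mapping from body-part names to joint indices. The abstract does not specify TBIFormer's actual joint grouping, token layout, or embedding layers, so `body_part_tokens` is an illustrative name, not the authors' code. Each (person, body part) pair becomes one token that concatenates that part's coordinates across all frames, so the multi-person sequence keeps both temporal order and body semantics:

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: poses[p][t][j] is the (x, y, z) position of joint j
# of person p at frame t.
Joint = Tuple[float, float, float]


def body_part_tokens(
    poses: Sequence[Sequence[Sequence[Joint]]],
    parts: Dict[str, List[int]],
) -> List[Tuple[int, str, List[float]]]:
    """Turn every (person, body part) pair into one token spanning all frames.

    `parts` is an assumed grouping of joint indices into body parts.
    """
    tokens: List[Tuple[int, str, List[float]]] = []
    for p, person in enumerate(poses):
        for name, joint_ids in parts.items():
            feature: List[float] = []
            for frame in person:            # keep temporal order
                for j in joint_ids:         # keep joint order inside the part
                    feature.extend(frame[j])
            tokens.append((p, name, feature))
    return tokens


if __name__ == "__main__":
    # 2 people, 2 frames, 4 joints; joint j of person p at frame t is (p, t, j).
    poses = [[[(float(p), float(t), float(j)) for j in range(4)]
              for t in range(2)] for p in range(2)]
    parts = {"torso": [0, 1], "legs": [2, 3]}  # hypothetical grouping
    for person, part, feat in body_part_tokens(poses, parts):
        print(person, part, len(feat))
```

With 2 people, 2 frames, and two parts of 2 joints each, this yields 4 tokens of length 2 × 2 × 3 = 12. An attention module such as SBI-MSA could then attend across tokens of the same person (intra-individual) or different people (inter-individual).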
PCCNet: A Few-Shot Patch-wise Contrastive Colorization Network
Few-shot colorization aims to learn a model that colorizes images from little training data. Yet existing models often fail to maintain color consistency because they ignore the correlations between image patches. In this paper, we propose PCCNet, a novel Patch-wise Contrastive Colorization Network that learns color synthesis by measuring the similarities and variations of image patches in two aspects: inter-image and intra-image. Specifically, for the inter-image aspect, we investigate a patch-wise contrastive learning mechanism with a positive and negative sample constraint to distinguish color features between patches across images. For the intra-image aspect, we explore a new intra-image correlation loss function that measures the similarity distribution, which reveals structural relations between patches within an image. Furthermore, we propose a novel color memory loss that improves the accuracy with which the memory module stores and retrieves data. Experiments show that our method lets correctly saturated colors spread naturally over objects and achieves higher scores than related methods in quantitative comparisons.
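The abstract does not give PCCNet's exact contrastive loss. As an illustration only, the sketch below uses the standard InfoNCE form common in patch-wise contrastive learning: an anchor patch feature is pulled toward a positive patch and pushed away from negative patches, using cosine similarity scaled by a temperature `tau`. The function name `patch_info_nce` and the value 0.07 are assumptions, not taken from the paper:

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def patch_info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    tau: float = 0.07,  # assumed temperature
) -> float:
    """InfoNCE loss for one anchor patch: -log softmax score of its positive."""
    q = _normalize(anchor)
    logits = [_dot(q, _normalize(positive)) / tau]
    logits += [_dot(q, _normalize(n)) / tau for n in negatives]
    m = max(logits)  # log-sum-exp with max subtraction for numerical stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    good = patch_info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    bad = patch_info_nce([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(f"aligned positive: {good:.6f}, misaligned positive: {bad:.3f}")
```

The loss is close to zero when the anchor matches its positive and is far from the negatives, and it grows sharply when a negative is more similar to the anchor than the positive is. In a patch-wise setting, this would be averaged over many anchor patches drawn across images.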
USTNet: Unsupervised Shape-to-Shape Translation via Disentangled Representations
We propose USTNet, a novel deep learning approach for learning shape-to-shape translation between unpaired domains in an unsupervised manner. The core of our approach is disentangled representation learning, which factors the discriminative features of 3D shapes into content and style codes. Given input shapes from multiple domains, USTNet disentangles their representations into style codes, which capture traits that are distinctive across domains, and content codes, which capture domain-invariant traits. By fusing the style code of a target shape with the content code of a source shape, our method synthesizes new shapes that resemble the target style while retaining the content features of the source shape. Based on the shared style space, our method also supports shape interpolation by manipulating style attributes from different domains. Furthermore, by extending the basic building blocks of our network from two-class to multi-class classification, we adapt USTNet to multi-domain shape-to-shape translation. Experimental results show that our approach generates realistic and natural translated shapes and improves on 3DSNet in quantitative evaluation metrics. Code is available at https://Haoran226.github.io/USTNet
Co-skeletons: Consistent curve skeletons for shape families
We present co-skeletons, a new method that computes consistent curve skeletons for 3D shapes from a given family. Co-skeletons are consistent in terms of sampling density and semantic relevance, while preserving the desired characteristics of traditional, per-shape curve skeletonization approaches. As input, we take the curve skeletons extracted by traditional approaches for all shapes in a family, and compute semantic correlation information for individual skeleton branches to guide an edge-pruning process via skeleton-based descriptors, clustering, and a voting algorithm. Our approach produces more concise and family-consistent skeletons than traditional per-shape methods. We show the utility of our method by using co-skeletons for shape segmentation and shape blending on real-world data.
Learning Weakly Supervised Audio-Visual Violence Detection in Hyperbolic Space
In recent years, the task of weakly supervised audio-visual violence
detection has gained considerable attention. The goal of this task is to
identify violent segments within multimodal data based on video-level labels.
Despite advances in this field, traditional Euclidean neural networks, which
have been used in prior research, encounter difficulties in capturing highly
discriminative representations due to limitations of the feature space. To
overcome this, we propose HyperVD, a novel framework that learns snippet
embeddings in hyperbolic space to improve model discrimination. Our framework
comprises a detour fusion module for multimodal fusion, which effectively alleviates
the modality inconsistency between audio and visual signals. Additionally, we
introduce two branches of fully hyperbolic graph convolutional networks that
mine feature similarities and temporal relationships among snippets in
hyperbolic space. By learning snippet representations in this space, the
framework effectively learns the semantic discrepancies between violent and normal
events. Extensive experiments on the XD-Violence benchmark demonstrate that our
method outperforms state-of-the-art methods by a sizable margin.
Comment: 8 pages, 5 figures
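For readers unfamiliar with hyperbolic embeddings, here is a minimal sketch of mapping Euclidean snippet features into hyperbolic space and measuring distances there. It assumes the Lorentz (hyperboloid) model with curvature -1, a common choice for fully hyperbolic networks; the abstract does not state which model or curvature HyperVD uses, and the function names `lorentz_inner`, `expmap0`, and `lorentz_distance` are hypothetical:

```python
import math
from typing import List, Sequence


def lorentz_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(v: Sequence[float]) -> List[float]:
    """Exponential map at the origin: Euclidean feature -> hyperboloid point.

    The result x satisfies <x, x>_L = -1 (curvature -1, assumed here).
    """
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)  # origin (1, 0, ..., 0)
    scale = math.sinh(r) / r
    return [math.cosh(r)] + [scale * a for a in v]


def lorentz_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance arcosh(-<x, y>_L) on the unit hyperboloid."""
    # Clamp at 1.0 so rounding error never pushes the argument out of domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


if __name__ == "__main__":
    origin = expmap0([0.0, 0.0, 0.0])
    snippet = expmap0([0.3, -1.2, 0.5])  # a made-up snippet feature
    print(lorentz_inner(snippet, snippet))    # approximately -1
    print(lorentz_distance(origin, snippet))  # approximately the feature's norm
```

Points produced by `expmap0` lie on the hyperboloid, and the distance from the origin equals the Euclidean norm of the input feature. Because the volume of hyperbolic space grows exponentially with radius, embeddings have more room to separate snippets than in Euclidean space, which is the kind of property hyperbolic methods draw on for more discriminative representations.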
- …